Towards a Generic Approach for Schema Matcher Selection: Leveraging User Pre- and Post-match Effort for Improving Quality and Time Performance
Abstract
Interoperability between applications and bridges between data sources are required for effective information exchange. Yet some of the processes needed to achieve this integration cannot be fully automated because of their complexity. One of these processes, called matching, has been studied for years. It aims at discovering semantic correspondences between the elements of data sources and is still largely performed manually. Deploying large data sharing systems therefore requires (semi-)automating the matching process. Many schema matching tools have been designed to discover mappings between schemas. However, some of these tools target matching tasks with specific criteria, such as large-scale scenarios or the discovery of complex mappings. And, contrary to the ontology alignment research field, there is no common platform to evaluate them. The abundance of schema matching tools, combined with these two issues, makes it difficult for a user to choose the most appropriate tool for a given scenario. In this dissertation, our first contribution is a benchmark, XBenchMatch, for evaluating schema matching tools. It consists of several schema matching scenarios, each of which features one or more criteria. In addition, we have designed new measures to evaluate the quality of integrated schemas and the user post-match effort. This study and analysis of existing matching tools enables a better understanding of the matching process. Without external resources, most matching tools are unable to detect a mapping between elements with totally dissimilar labels. Conversely, they cannot rule out a mapping between elements with similar labels.
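The limitation of purely terminological measures noted above can be illustrated with a minimal sketch. It assumes a character-level similarity from Python's standard `difflib` module as a stand-in for a terminological measure; the label pairs are hypothetical and are not taken from XBenchMatch.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Terminological similarity in [0, 1], based on matching characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical label pairs, chosen only for illustration.
# Same meaning, dissimilar labels: the measure scores them low.
synonyms = label_similarity("author", "writer")
# Different meaning, similar labels: the measure scores them high.
lookalikes = label_similarity("address", "addressee")

print(f"author/writer:     {synonyms:.3f}")
print(f"address/addressee: {lookalikes:.3f}")
```

A label-only matcher would miss the first pair and wrongly accept the second, which is why additional evidence such as the schema structure is needed.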
Our second contribution, BMatch, is a matching tool that includes a structural similarity measure and aims at solving these issues using only the schema structure. Terminological measures enable the discovery of mappings whose schema elements share similar labels. Conversely, structural measures, based on the cosine measure, detect mappings when schema elements have the same neighbourhood. The second aspect of BMatch aims at improving time performance by using an indexing structure, the B-tree, to accelerate the schema matching process. We empirically demonstrate the benefits and the limits of our approach.
Like most schema matching tools, BMatch uses an aggregation function to combine similarity values, which implies several drawbacks in terms of quality and performance. Tuning the parameters is another burden for the user. To tackle these issues, MatchPlanner introduces a new method that combines similarity measures by relying on decision trees. As decision trees can be learned, parameters are automatically tuned and similarity measures are computed only when necessary. We show that our approach provides an increase in matching quality and better time performance with regard to other matching tools. We also present the possibility of letting users choose a preference between precision and recall.
Even with tuning capabilities, schema matching tools are still not generic enough to provide results of acceptable quality for most schema matching scenarios. We finally extend MatchPlanner by proposing a factory of schema matchers, named YAM (for Yet Another Matcher). This tool brings more flexibility since it generates an 'à la carte' matcher for a given schema matching scenario. Indeed, schema matchers can be seen as machine learning classifiers, since they classify pairs of schema elements as either relevant or irrelevant. Thus, the best matcher in terms of matching quality is built and selected from a set of different classifiers. We also show the impact on quality when the user provides some inputs, namely a list of expert mappings and a preference between precision and recall.
tel-00436547, version 1, 27 Nov 2009
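The contrast between an aggregation function and a decision tree over similarity measures can be sketched as follows. This is an illustrative sketch only, not the BMatch or MatchPlanner implementation: the two measures, the weights, the thresholds, and the hand-written tree are all assumptions (in MatchPlanner the tree would be learned), and the schema elements are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt


def label_sim(a, b):
    """Terminological measure: character-level similarity of two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_sim(ctx_a, ctx_b):
    """Structural measure: cosine between bag-of-label vectors of neighbours."""
    vocab = sorted(set(ctx_a) | set(ctx_b))
    va = [ctx_a.count(term) for term in vocab]
    vb = [ctx_b.count(term) for term in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def aggregated_match(a, b, ctx_a, ctx_b, weights=(0.5, 0.5), threshold=0.6):
    """Aggregation: every measure is computed, then combined by a weighted sum."""
    score = weights[0] * label_sim(a, b) + weights[1] * context_sim(ctx_a, ctx_b)
    return score >= threshold


def tree_match(a, b, ctx_a, ctx_b, computed):
    """Hand-written two-level tree; `computed` records which measures ran."""
    computed.append("label")
    if label_sim(a, b) >= 0.9:
        return True  # labels alone are decisive, the structural measure is skipped
    computed.append("context")
    return context_sim(ctx_a, ctx_b) >= 0.5


# Hypothetical neighbouring labels of the elements being compared.
ctx_1, ctx_2 = ["book", "title", "isbn"], ["book", "title", "year"]

trace_same = []
same = tree_match("Title", "title", ctx_1, ctx_2, trace_same)

trace_syn = []
syn_tree = tree_match("author", "writer", ctx_1, ctx_2, trace_syn)
syn_agg = aggregated_match("author", "writer", ctx_1, ctx_2)

print(same, trace_same)
print(syn_tree, trace_syn, syn_agg)
```

With these arbitrary settings, the tree evaluates only the label measure for the identical labels and falls back to the structural measure for the author/writer pair, which it accepts; the weighted sum rejects the same pair because the low label score drags the combined value under the threshold.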
Similar resources
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task consists of finding the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Improvement of effort estimation accuracy in software projects using a feature selection approach
In recent years, utilization of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become an inevitable demand. The high volumes of data, costs, and time necessary for gathering data, ...
The Effectiveness of the Schema Therapy Approach on Quality of Life and Self-Criticism in Women with Suicidal Thoughts: A Quasi-Experimental Study
Background and Objectives: Women are victims of chaotic family conditions, social damage, and structural poverty. Hasty actions, women's ignorance and lack of attention to the consequences of the decision, despite the intense pressure on them, have provided the basis for suicide. The aim of the present study was to determine the effectiveness of schema therapy on quality of life and self-critic...
Results of GeRoMeSuite for OAEI 2008
GeRoMeSuite is a generic model management system which provides several functions for managing complex data models, such as schema integration, definition and execution of schema mappings, model transformation, and matching. The system uses the generic metamodel GeRoMe for representing models, and because of this, it is able to deal with models in various modeling languages such as XML Schema, ...
Designing a Benchmark for the Assessment of Schema Matching Tools
Over the years, many schema matching approaches have been developed to discover correspondences between schemas. Although this task is crucial in data integration, its evaluation, both in terms of matching quality and time performance, is still manually performed. Indeed, there is no common platform which gathers a collection of schema matching datasets to fulfil this goal. Another problem deal...
Publication date: 2009